# Low-Latency Inference

## NeuroBERT-Mini
*boltuix · MIT · Large Language Model, Transformers · 212 downloads · 10 likes*

NeuroBERT-Mini is a lightweight natural language processing model derived from google/bert-base-uncased, optimized for real-time inference on edge and IoT devices.

## Vaani
*panchajanya-ai · Apache-2.0 · Audio Classification, Multilingual · 25 downloads · 2 likes*

A multilingual audio classification model based on speechbrain/lang-id-commonlanguage_ecapa, supporting identification of five Indian languages.
## Japanese Reranker Tiny V2
*hotchpotch · MIT · Text Embedding, Japanese · 339 downloads · 3 likes*

A very compact, fast Japanese reranking model that improves the accuracy of RAG systems and runs efficiently on CPUs and edge devices.

## Japanese Reranker XSmall V2
*hotchpotch · MIT · Text Embedding, Japanese · 260 downloads · 1 like*

A very compact, fast Japanese reranking model for improving the accuracy of RAG systems.
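Rerankers like the two above sit between retrieval and generation in a RAG pipeline: a first-stage retriever returns candidate passages, and the reranker re-scores each (query, passage) pair so only the most relevant passages reach the LLM. A minimal sketch of that second stage, with a toy lexical-overlap scorer standing in for the actual model:

```python
def toy_score(query: str, passage: str) -> float:
    # Toy stand-in for a cross-encoder: the fraction of query tokens
    # that also appear in the passage. A real reranker would instead
    # run the (query, passage) pair through the model.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    # Score every retrieved candidate, then keep the top_k best.
    scored = [(toy_score(query, p), p) for p in passages]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]
```

Swapping `toy_score` for a model call is the only change needed to use a real reranker; the surrounding sort-and-truncate logic stays the same.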
## Qwen2.5-VL-72B-Instruct FP8 Dynamic
*parasail-ai · Apache-2.0 · Image-to-Text, Transformers, English · 78 downloads · 1 like*

FP8-quantized version of Qwen2.5-VL-72B-Instruct, supporting vision-text input and text output, optimized and released by Neural Magic.

## Gemma 3 4B IT INT8 Asym OV
*Echo9Zulu · Apache-2.0 · Image-to-Text · 152 downloads · 1 like*

A Gemma 3 4B-parameter model optimized with OpenVINO, supporting text-to-text and vision-text inference.
## Faster Distil-Whisper Large V3.5
*Purfview · MIT · Speech Recognition, English · 565 downloads · 2 likes*

Distil-Whisper is a distilled version of the Whisper model, optimized for automatic speech recognition (ASR) and offering faster inference.

## Faster Distil-Whisper Large V3.5
*deepdml · MIT · Speech Recognition, English · 58.15k downloads · 2 likes*

A CTranslate2-format model converted from Distil-Whisper large-v3.5 for efficient speech recognition.

## RWKV7 Goose World3 2.9B HF
*RWKV · Apache-2.0 · Large Language Model, Multilingual · 132 downloads · 7 likes*

The RWKV-7 model uses the flash-linear-attention format, supports multilingual text generation, and has 2.9 billion parameters.
## Phi-4-Multimodal-Instruct
*mjtechguy · MIT · Multimodal Fusion, Transformers, Multilingual · 18 downloads · 0 likes*

Phi-4-multimodal-instruct is a lightweight open-source multimodal foundation model that accepts text, image, and audio inputs and generates text outputs, with a 128K-token context length.

## Pixtral 12B Quantized.w8a8
*RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · 309 downloads · 1 like*

INT8-quantized version of mgoin/pixtral-12b, supporting vision-text multimodal tasks with improved inference efficiency.

## Qwen2.5-VL-7B-Instruct Quantized.w8a8
*RedHatAI · Apache-2.0 · Image-to-Text, Transformers, English · 1,992 downloads · 3 likes*

Quantized version of Qwen2.5-VL-7B-Instruct, supporting vision-text input and text output, with inference efficiency improved through INT8 weight quantization.
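Several of the checkpoints above ship INT8-quantized weights. The core idea is to map float weights onto 8-bit integers via a scale factor, then dequantize on the fly at inference time, trading a small accuracy loss for roughly 4× smaller weights. A minimal sketch of symmetric per-tensor quantization (the repos above typically use finer-grained per-channel schemes with calibration, so this is illustrative only):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric per-tensor quantization: one scale maps the float
    # range [-max_abs, max_abs] onto the INT8 range [-127, 127].
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate float weights; the rounding error per
    # weight is bounded by scale / 2.
    return [qi * scale for qi in q]
```

The round trip `dequantize_int8(*quantize_int8(w))` reproduces each weight to within half a quantization step, which is why the quality loss of weight-only INT8 is usually modest.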
## LB Reranker 0.5B V1.0
*lightblue · Apache-2.0 · Large Language Model, Transformers, Multilingual · 917 downloads · 66 likes*

The LB Reranker scores the relevance between queries and text snippets, supports 95+ languages, and is suitable for ranking and reranking in retrieval tasks.

## Kotoba-Whisper-Bilingual V1.0
*kotoba-tech · Apache-2.0 · Speech Recognition, Transformers, Multilingual · 782 downloads · 13 likes*

Kotoba-Whisper-Bilingual is a collection of distilled Whisper models designed for Japanese and English speech recognition and speech-to-text translation.

## LayerSkip Llama2 7B
*facebook · Other · Large Language Model, Transformers, English · 1,674 downloads · 14 likes*

An improved model based on Llama 2 7B that supports layer skipping and self-speculative decoding to improve inference efficiency.
## Llama 3 FireFunction V2
*fireworks-ai · Large Language Model, Transformers · 1,361 downloads · 145 likes*

FireFunction V2 is a state-of-the-art function-calling model with a commercially viable license, trained on Llama 3, supporting parallel function calls and strong instruction following.

## LLM Compiler 13B
*facebook · Other · Large Language Model, Transformers · 107 downloads · 84 likes*

LLM Compiler is an advanced LLM based on Code Llama, designed for code optimization and compiler reasoning tasks.

## TinyAgent ToolRAG
*squeeze-ai-lab · Large Language Model, Transformers, English · 45 downloads · 16 likes*

TinyAgent is a small language model (SLM) designed for edge devices, focused on function calling and complex reasoning, offering privacy protection and low-latency serving.

## Hiera Base 224 In1k HF
*facebook · Image Classification, Transformers, English · 188 downloads · 2 likes*

Hiera is a hierarchical vision Transformer that is fast, powerful, and simple. It surpasses the state of the art across a wide range of image and video tasks while significantly improving runtime speed.

## CodeGemma 1.1 7B IT
*google · Large Language Model, Transformers · 209 downloads · 50 likes*

CodeGemma is a family of lightweight open code models built on Gemma, specializing in code generation and code-focused dialogue.
## Distil-Whisper Large V3 German
*primeline · Apache-2.0 · Speech Recognition, Transformers, German · 207 downloads · 15 likes*

A German speech recognition model based on distil-whisper, with 756 million parameters, achieving faster inference while maintaining high quality.

## Ragas Critic LLM Qwen1.5 GPTQ
*explodinggradients · Apache-2.0 · Large Language Model, Transformers · 26 downloads · 12 likes*

The Ragas critic model is part of the Ragas synthetic test-data generation pipeline, serving as an alternative to GPT-4 for evaluation tasks.

## Distil Large V3
*distil-whisper · MIT · Speech Recognition, English · 417.11k downloads · 311 likes*

Distil-Whisper is a knowledge-distilled version of Whisper large-v3, focused on English automatic speech recognition, offering faster inference while staying close to the original model's accuracy.

## Faster Distil-Whisper Medium.en
*Systran · MIT · Speech Recognition, English · 6,155 downloads · 4 likes*

A version of distil-whisper/distil-medium.en converted to CTranslate2 format for efficient speech recognition.

## Faster Distil-Whisper Large V2
*Systran · MIT · Speech Recognition, English · 1,336 downloads · 19 likes*

A distilled automatic speech recognition (ASR) model based on the Whisper architecture, designed for efficient inference on English speech-to-text tasks.
## Multilingual E5 Small Optimized
*elastic · MIT · Text Embedding, Multilingual · 201 downloads · 15 likes*

A quantized version of multilingual-e5-small, optimized for inference performance through layer-wise quantization while retaining most of the original model's quality.

## XLM-RoBERTa Base Language Detection ONNX
*protectai · MIT · Text Classification, Transformers, Multilingual · 6,535 downloads · 6 likes*

An ONNX conversion of papluca/xlm-roberta-base-language-detection for multilingual text classification, supporting detection of 20 languages.

## Replit Code V1.5 3B
*replit · Apache-2.0 · Large Language Model, Transformers, Other · 1,773 downloads · 295 likes*

A 3.3B-parameter causal language model specialized in code completion, supporting 30 programming languages.

## BGE Large EN V1.5 Quant
*RedHatAI · MIT · Text Embedding, Transformers, English · 1,094 downloads · 22 likes*

A quantized (INT8) ONNX variant of BGE-large-en-v1.5 with inference acceleration via DeepSparse.
## MIT AST Finetuned Speech Commands V2 OV
*helenai · Audio Classification, Transformers, English · 514 downloads · 0 likes*

An OpenVINO-optimized conversion of MIT/ast-finetuned-speech-commands-v2, designed to accelerate inference for voice command recognition.

## EfficientFormer L3 300
*snap-research · Apache-2.0 · Image Classification, English · 279 downloads · 2 likes*

EfficientFormer-L3 is a lightweight vision Transformer developed by Snap Research, optimized for low latency on mobile devices while maintaining high performance.

## MobileNet V1 0.75 192
*google · Other · Image Classification, Transformers · 31.54k downloads · 2 likes*

MobileNet V1 is a lightweight convolutional neural network designed for mobile devices, balancing latency, model size, and accuracy in image classification.
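The MobileNet entries below owe their small latency and size budgets largely to depthwise separable convolutions: each standard convolution is replaced by a per-channel depthwise filter followed by a 1×1 pointwise convolution. A quick sketch of the parameter savings (bias terms ignored for simplicity):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # A standard conv learns one k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: a 1x1 conv mixing c_in channels into c_out.
    return k * k * c_in + c_in * c_out
```

For a 3×3 layer with 256 input and 256 output channels, the standard form needs 589,824 parameters versus 67,840 for the separable form, roughly an 8.7× reduction at that layer.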
## MobileNet V1 1.0 224
*google · Other · Image Classification, Transformers · 5,344 downloads · 1 like*

MobileNet V1 is a lightweight convolutional neural network designed for mobile and embedded vision applications, pre-trained on ImageNet-1k.

## MobileNet V2 1.0 224
*google · Other · Image Classification, Transformers · 69.47k downloads · 29 likes*

MobileNet V2 is a lightweight vision model optimized for mobile devices, excelling at image classification.

## MobileNet V2 1.4 224
*google · Other · Image Classification, Transformers · 737 downloads · 1 like*

A lightweight image classification model pre-trained on ImageNet-1k and optimized for mobile devices.

## T5 Small OpenVINO
*echarlaix · Apache-2.0 · Large Language Model, Transformers, Multilingual · 3,749 downloads · 4 likes*

An OpenVINO IR-format version of T5-small, supporting text generation, translation, and other tasks.
## MobileNet V2 1.4 224
*Matthijs · Other · Image Classification, Transformers · 26 downloads · 0 likes*

MobileNet V2 is a lightweight convolutional neural network designed for mobile devices, excelling at image classification.

## MobileNet V2 1.0 224
*Matthijs · Other · Image Classification, Transformers · 29 downloads · 0 likes*

MobileNet V2 is a lightweight convolutional neural network designed for mobile devices, excelling at image classification.

## MobileNet V1 1.0 224
*Matthijs · Other · Image Classification, Transformers · 41 downloads · 0 likes*

MobileNet V1 is a lightweight convolutional neural network designed for mobile and embedded vision applications, pre-trained on ImageNet-1k.

## MS MARCO MiniLM L2 V2
*cross-encoder · Apache-2.0 · Text Embedding, English · 533.42k downloads · 11 likes*

A cross-encoder trained on the MS MARCO passage ranking task for query-passage relevance scoring in information retrieval.